Speech retrieval with video parsing for television news programs
Abstract
We have been working on speech retrieval from Chinese (Cantonese) television news programs. The use of automatic speech recognition for audio indexing produces imperfect transcriptions, and recognition errors affect retrieval performance. A news story typically contains a brief report by the anchor person(s) in the studio, as well as news footage from the field. Investigation shows that our recognizer performs better when indexing audio from the studio, compared to that from the field. In order to automatically extract the "reliable" audio segments for speech retrieval, we attempt to detect studio-to-field transitions by means of video parsing. Our study is based on 146 news stories collected from local television Cantonese news programs. We formulated a known-item retrieval task and adopted the average inverse rank (AIR) as our evaluation metric. Retrieval is performed based on syllable bigram units, augmented with skipped syllable bigrams. Retrieval using the entire audio track of each news story gave AIR=0.759. With the incorporation of video parsing, we performed retrieval based only on the studio recordings, which produced AIR=0.768.
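The indexing units and the evaluation metric described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: a "skipped" bigram is taken to pair two syllables separated by at most `max_skip` intervening syllables, AIR is taken to be the mean of 1/rank of the known item over all queries, and documents are scored by simple overlap of bigram units. The function names and the toy syllables are hypothetical and are not taken from the authors' system.

```python
from typing import Dict, List, Sequence, Set, Tuple

Bigram = Tuple[str, str]


def syllable_bigrams(syllables: Sequence[str], max_skip: int = 1) -> Set[Bigram]:
    """Return adjacent syllable bigrams plus skipped bigrams.

    Assumption: a skipped bigram pairs two syllables that have at most
    `max_skip` syllables between them. With max_skip=0 only adjacent
    bigrams are produced.
    """
    units: Set[Bigram] = set()
    for i, first in enumerate(syllables):
        for gap in range(max_skip + 1):
            j = i + 1 + gap
            if j < len(syllables):
                units.add((first, syllables[j]))
    return units


def rank_documents(
    query: Sequence[str],
    documents: Dict[str, Sequence[str]],
    max_skip: int = 1,
) -> List[str]:
    """Rank document ids by the number of bigram units shared with the query.

    Assumption: plain overlap counting stands in for whatever scoring
    the original retrieval system uses. Ties are broken by document id.
    """
    query_units = syllable_bigrams(query, max_skip)
    scores = {
        doc_id: len(query_units & syllable_bigrams(syllables, max_skip))
        for doc_id, syllables in documents.items()
    }
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def average_inverse_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank, where each rank is the 1-based position of the
    known item in the ranked list returned for one query."""
    if not ranks:
        raise ValueError("at least one rank is required")
    return sum(1.0 / rank for rank in ranks) / len(ranks)
```

Under these assumptions, a query whose known item is always ranked first yields AIR = 1.0, and lower values mean the target story tends to appear further down the list. Restricting `documents` to syllables recognized from studio segments only would correspond to the video-parsing condition described in the abstract.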
Similar resources
Automatic Story Segmentation for Spoken Document Retrieval
We have been working on speech retrieval based on Cantonese television news programs. Our video archive contains over 20 hours of news programs provided by a local television station. These programs have been hand-segmented into video clips, where each clip is a self-contained news story. The audio tracks in our archive are indexed by Cantonese speech recognition. This is integrated with a vect...
AT_TV: Broadcast Television and Radio Retrieval
This paper reports recent work at AT&T Laboratories Cambridge to develop retrieval systems for broadcast television and radio programmes. Unlike some other systems, it does not rely on manual classification or annotation of the broadcast material; it is indexed automatically from the air. While many digital video library projects focus solely on broadcast news, we have broadened our efforts to ...
Parsing video programs into individual segments using FSA modeling
Parsing video programs into program segments is useful for retrieving individual segments and for video summarization. Many video classes have a structure that can be effectively modeled using Finite-State Automata (FSA). Each video segment, such as a newscaster sequence or a weather sequence, becomes a node in the FSA. A transition fires from one node to another based on arc conditio...
P1: Negative Television and Memory
According to reports on about 30 thousand people, time spent watching television had an impact on memory and recall, and the results showed no differences between men and women. People who watched less than an hour a day did better at every memory function. As these participants watched negative political ads, physiological responses indicated that their bodies were reflexively preparing to move...
Audio-visual segmentation for content-based retrieval
This paper reports recent work at ORL on segmentation of digital audio/video recordings. Firstly, we describe an audio segmentation algorithm that partitions a soundtrack into manageably sized segments for speech recognition. Secondly, we present an algorithm for detecting camera shot-break locations in the video. The output of these two algorithms is combined to produce a semantically meaningf...
Publication date: 2001